95 research outputs found

A Note on Linear Time Algorithms for Maximum Error Histograms

Author: Guha Sudipto
Shim Kyuseok
Publication venue: ScholarlyCommons
Publication date: 01/07/2007

Histograms and wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures which are the sums of a suitable function, e.g., square, of the error at each point. Although the best-known algorithms for solving these problems run in quadratic time, a sequence of results has given us a linear time approximation scheme for these algorithms. In recent years, there have been many emerging applications where we are interested in measuring the maximum (absolute or relative) error at a point. We show that this problem is fundamentally different from the other traditional non-ℓ∞ error measures and provide an optimal algorithm that runs in linear time for a small number of buckets. We also present results which work for arbitrary weighted maximum error measures.
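To make the maximum-error objective in the abstract above concrete, here is a minimal sketch. It is not the linear-time algorithm of the paper, which the abstract does not spell out. Instead it assumes the unweighted absolute-error case, represents each bucket by the midpoint of its minimum and maximum (so the bucket's error is half its range), and finds the optimum by binary search over candidate errors with a greedy feasibility check. The function names `buckets_needed` and `min_max_error` are hypothetical.

```python
def buckets_needed(values, eps):
    """Greedy check: fewest contiguous buckets whose half-range is at most eps."""
    count = 0
    lo = hi = None
    for v in values:
        if lo is None or (max(hi, v) - min(lo, v)) / 2 > eps:
            # v does not fit in the current bucket: open a new one.
            count += 1
            lo = hi = v
        else:
            lo, hi = min(lo, v), max(hi, v)
    return count


def min_max_error(values, num_buckets):
    """Smallest achievable maximum absolute error with at most num_buckets buckets.

    Assumes num_buckets >= 1. Brute-force candidate enumeration is quadratic;
    this illustrates the objective, not an efficient construction.
    """
    # The optimum is the half-range of some contiguous interval (or 0).
    candidates = {0.0}
    n = len(values)
    for i in range(n):
        lo = hi = values[i]
        for j in range(i, n):
            lo, hi = min(lo, values[j]), max(hi, values[j])
            candidates.add((hi - lo) / 2)
    cands = sorted(candidates)

    # Feasibility is monotone in eps, so binary search finds the smallest feasible one.
    left, right = 0, len(cands) - 1
    while left < right:
        mid = (left + right) // 2
        if buckets_needed(values, cands[mid]) <= num_buckets:
            right = mid
        else:
            left = mid + 1
    return cands[left]
```

For example, `min_max_error([1, 2, 10, 11, 30], 2)` returns `5.0`: the best split is `[1, 2, 10, 11]` and `[30]`, and the first bucket's midpoint 6 is off by at most 5.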

ScholarlyCommons@Penn

Special Section on the International Conference on Data Engineering 2015

Author: Gehrke Johannes
Shim Kyuseok
Lehner Wolfgang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/01/2023

The papers in this special section were presented at the 31st International Conference on Data Engineering, held in Seoul, Korea, on April 13-17, 2015.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Message from the ICDE 2015 Program Committee and general chairs

Author: Cha Sang Kyun
Gehrke Johannes
Lehner Wolfgang
Lohman Guy
Shim Kyuseok
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/01/2023

Since its inception in 1984, the IEEE International Conference on Data Engineering (ICDE) has become a premier forum for the exchange and dissemination of data management research results among researchers, users, practitioners, and developers. Continuing this long-standing tradition, the 31st ICDE will be hosted this year in Seoul, South Korea, from April 13 to April 17, 2015. It is our great pleasure to welcome you to ICDE 2015 and to present its proceedings to you.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

From Large Language Models to Databases and Back: A discussion on research and education

Author: Amer-Yahia Sihem
Bonifati Angela
Chen Lei
Li Guoliang
Shim Kyuseok
Xu Jianliang
Yang Xiaochun
Publication date: 02/06/2023

This discussion was conducted at a recent panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023), held April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM (ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was moderated by Lei Chen and Xiaochun Yang. The discussion raised several questions on how large language models (LLMs) and database research and education can help each other, and on the potential risks of LLMs. Comment: 7 pages, 2 figures, the Panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023)

arXiv.org e-Print Archive

Mining Optimized Support Rules for Numeric Attributes

Author: Kyuseok Shim
Rajeev Rastogi
Publication venue: IEEE Computer Society Press
Publication date: 01/01/1999

In this paper, we generalize the optimized support association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. For rules containing a single numeric attribute, we present a dynamic programming algorithm for computing optimized association rules. Furthermore, we propose a bucketing technique for reducing the input size, and a divide-and-conquer strategy that improves the performance significantly without sacrificing optimality. Our experimental results for a single numeric attribute indicate that our bucketing and divide-and-conquer enhancements are very effective in reducing the execution times and memory requirements of our dynamic programming algorithm. Furthermore, they show that our algorithms scale up almost linearly with the attribute's domain size as well as the number of disjunctions.

1 Introduction

Association rules, introduced in [AIS93], provide a useful mechanism for discovering correlations among the underlying data. In it..

CiteSeerX

Crossref

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

Author: Kyuseok Shim
Rajeev Rastogi

Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from the decision tree in a subsequent pruning phase to improve accuracy and prevent "overfitting". In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase if it is determined that it will be pruned during the subsequent pruning phase. To make this determination for a node before it is expanded, PUBLIC computes a lower bound on the minimum cost subtree rooted at the node. This estimate is then used by PUBLIC to identify the nodes that are certai..

CiteSeerX